
    Transparent pointer compression for linked data structures

    64-bit address spaces are increasingly important for modern applications, but they come at a price: pointers use twice as much memory, reducing the effective cache capacity and memory bandwidth of the system (compared to 32-bit address spaces). This paper presents a sophisticated, automatic transformation that shrinks pointers from 64 bits to 32 bits. The approach is "macroscopic," i.e., it operates on an entire logical data structure in the program at a time. It allows an individual data structure instance or even a subset thereof to grow up to 2^32 bytes in size, and can compress pointers to some data structures but not others. Together, these properties allow efficient usage of a large (64-bit) address space. We also describe (but have not implemented) a dynamic version of the technique that can transparently expand the pointers in an individual data structure if it exceeds the 4GB limit. For a collection of pointer-intensive benchmarks, we show that the transformation reduces peak heap sizes substantially (by 20% to 2x) for several of these benchmarks, and improves overall performance significantly in some cases.
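
    The flavour of the transformation can be sketched by hand: keep a logical data structure in a contiguous pool and replace its 64-bit pointers with 32-bit indices into that pool. A minimal C++ sketch of that idea (the ListPool/NIL names are illustrative; the paper performs this rewriting automatically in the compiler):

    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Nodes of one logical list live in one contiguous vector, so a
    // 32-bit index identifies a node instead of a 64-bit pointer.
    struct Node {
        int      value;
        uint32_t next;  // compressed "pointer": index into the pool
    };

    constexpr uint32_t NIL = 0xFFFFFFFFu;  // compressed null

    struct ListPool {
        std::vector<Node> nodes;  // the pool; up to 2^32 entries

        uint32_t push_front(uint32_t head, int v) {
            nodes.push_back({v, head});              // allocate in the pool
            return static_cast<uint32_t>(nodes.size() - 1);
        }
        Node &deref(uint32_t idx) { return nodes[idx]; }  // decompress on use
    };

    int main() {
        ListPool pool;
        uint32_t head = NIL;
        for (int i = 0; i < 3; ++i) head = pool.push_front(head, i);
        for (uint32_t n = head; n != NIL; n = pool.deref(n).next)
            std::cout << pool.deref(n).value << ' ';  // prints: 2 1 0
        std::cout << '\n';
    }
    ```

    A side benefit of index-based links, visible even in this toy: the indices stay valid when the pool's storage moves, which is what makes a dynamic "expand past 4GB" variant conceivable.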

    Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption

    This paper presents the interesting observation that by performing fewer of the optimizations available in a standard compiler optimization level such as -O2, while preserving their original ordering, significant savings can be achieved in both execution time and energy consumption. This observation has been validated on two embedded processors, namely the ARM Cortex-M0 and the ARM Cortex-M3, using two different versions of the LLVM compilation framework: v3.8 and v5.0. Experimental evaluation with 71 embedded benchmarks demonstrated performance gains for at least half of the benchmarks for both processors. An average execution time reduction of 2.4% and 5.3% was achieved across all the benchmarks for the Cortex-M0 and Cortex-M3 processors, respectively, with execution time improvements ranging from 1% up to 90% over -O2. The savings that can be achieved are in the same range as what can be achieved by the state-of-the-art compilation approaches that use iterative compilation or machine learning to select flags or to determine phase orderings that result in more efficient code. In contrast to these time-consuming and expensive-to-apply techniques, our approach only needs to test a limited number of optimization configurations, fewer than 64, to obtain similar or even better savings. Furthermore, our approach can support multi-criteria optimization as it targets execution time, energy consumption and code size at the same time.
    Comment: 15 pages, 3 figures, 71 benchmarks used for evaluation
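
    One way to picture the search space: take the -O2 pass list in its original order and simply omit passes. A hedged C++ sketch that enumerates leave-one-out configurations over a few legacy LLVM pass flags and prints the opt command lines one would then time (bench.bc is a hypothetical input file, and the paper's actual configuration set differs from this simplification):

    ```cpp
    #include <cstdio>
    #include <string>
    #include <vector>

    // A few passes from the legacy -O2 pipeline, in their -O2 order.
    // The full list can be printed with the legacy pass manager via:
    //   llvm-as < /dev/null | opt -O2 -disable-output -debug-pass=Arguments
    std::vector<std::string> o2 = {"-sroa", "-early-cse", "-instcombine",
                                   "-simplifycfg", "-gvn", "-licm"};

    int main() {
        // Leave-one-out: drop a single pass, keep the original ordering.
        for (size_t skip = 0; skip < o2.size(); ++skip) {
            std::string cmd = "opt";
            for (size_t i = 0; i < o2.size(); ++i)
                if (i != skip) cmd += " " + o2[i];
            cmd += " bench.bc -o bench.opt.bc";   // hypothetical benchmark
            std::printf("%s\n", cmd.c_str());     // run and time each one
        }
    }
    ```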

    ProbeGuard: Mitigating Probing Attacks Through Reactive Program Transformations

    Many modern defenses against code reuse rely on hiding sensitive data such as shadow stacks in a huge memory address space. While much more efficient than traditional integrity-based defenses, these solutions are vulnerable to probing attacks, which quickly locate the hidden data and compromise security. This has led researchers to question the value of information hiding in real-world software security. Instead, we argue that such a limitation is not fundamental and that information hiding and integrity-based defenses are two extremes of a continuous spectrum of solutions. We propose a solution, ProbeGuard, that automatically balances performance and security by deploying an existing information-hiding baseline defense and then incrementally moving to more powerful integrity-based defenses by hotpatching when probing attacks occur. ProbeGuard is efficient, provides strong security, and gracefully trades off performance upon encountering more probing primitives.
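
    The reactive idea can be mimicked in miniature: run the cheap defense by default, detect a probe as a faulting access, and switch the sensitive code path to a hardened variant. A heavily simplified C++ sketch assuming POSIX signals; real ProbeGuard uses dedicated probe-anomaly detectors and hot-patches instrumented code fragments, rather than flipping a function pointer:

    ```cpp
    #include <atomic>
    #include <cstdio>
    #include <setjmp.h>
    #include <signal.h>

    static sigjmp_buf recover;
    static std::atomic<bool> probed{false};

    static void on_segv(int) {
        probed = true;             // a probing attempt was detected
        siglongjmp(recover, 1);    // escape the faulting access
    }

    static void fast_variant() { std::puts("baseline (hiding only)"); }
    static void hard_variant() { std::puts("hardened (integrity checks)"); }

    int main() {
        struct sigaction sa{};
        sa.sa_handler = on_segv;
        sigaction(SIGSEGV, &sa, nullptr);

        void (*sensitive)() = fast_variant;  // stand-in for hot-patching
        sensitive();

        if (sigsetjmp(recover, 1) == 0)      // saves the signal mask too
            (void)*(volatile char *)0x1;     // simulated probe: wild read
        if (probed) sensitive = hard_variant;  // "hot-patch" on detection
        sensitive();
    }
    ```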

    Test Case Permutation to Improve Execution Time


    Achieving High-Performance the Functional Way: A Functional Pearl on Expressing High-Performance Optimizations as Rewrite Strategies

    Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages - like C or OpenCL - force the programmer to intertwine the code describing functionality and optimizations. This results in a portability nightmare that is particularly problematic given the accelerating trend towards specialized hardware devices to further increase efficiency. Many emerging DSLs used in performance-demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process. Using a high-level - often functional - language, programmers focus on describing functionality in a declarative way. In some systems such as Halide or TVM, a separate schedule specifies how the program should be optimized. Unfortunately, these schedules are not written in well-defined programming languages. Instead, they are implemented as a set of ad-hoc predefined APIs that the compiler writers have exposed. In this functional pearl, we show how to employ functional programming techniques to solve this challenge with elegance. We present two functional languages that work together - each addressing a separate concern. RISE is a functional language for expressing computations using well-known functional data-parallel patterns. ELEVATE is a functional language for describing optimization strategies. A high-level RISE program is transformed into a low-level form using optimization strategies written in ELEVATE. From the rewritten low-level program, high-performance parallel code is automatically generated. In contrast to existing high-performance domain-specific systems with scheduling APIs, in our approach programmers are not restricted to a set of built-in operations and optimizations but freely define their own computational patterns in RISE and optimization strategies in ELEVATE in a composable and reusable way. We show how our holistic functional approach achieves competitive performance with the state-of-the-art imperative systems Halide and TVM.
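
    The essence of strategies as a language: a rewrite either succeeds or fails, and combinators compose small rewrites into larger ones. A toy C++ sketch of that idea (Expr, fuseMaps and the combinators mirror ELEVATE's seq/lChoice/try in spirit only; the real languages are richer):

    ```cpp
    #include <functional>
    #include <iostream>
    #include <optional>
    #include <string>

    // A strategy rewrites a program or fails (returns nullopt).
    using Expr = std::string;
    using Strategy = std::function<std::optional<Expr>(const Expr&)>;

    // seq(s, t): apply s, then t to its result; fail if either fails.
    Strategy seq(Strategy s, Strategy t) {
        return [=](const Expr& e) -> std::optional<Expr> {
            auto r = s(e);
            return r ? t(*r) : std::nullopt;
        };
    }
    // lChoice(s, t): try s; if it fails, try t instead.
    Strategy lChoice(Strategy s, Strategy t) {
        return [=](const Expr& e) { auto r = s(e); return r ? r : t(e); };
    }
    // tryS(s): like s, but never fails (identity on failure).
    Strategy tryS(Strategy s) {
        return lChoice(s, [](const Expr& e) -> std::optional<Expr> { return e; });
    }

    int main() {
        // A toy rewrite rule: "map f . map g" fuses into "map (f . g)".
        Strategy fuseMaps = [](const Expr& e) -> std::optional<Expr> {
            if (e == "map f (map g xs)") return Expr("map (f . g) xs");
            return std::nullopt;
        };
        Strategy opt = tryS(seq(fuseMaps, tryS(fuseMaps)));
        std::cout << *opt("map f (map g xs)") << '\n';  // map (f . g) xs
    }
    ```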

    LLHD: A Multi-level Intermediate Representation for Hardware Description Languages

    Modern Hardware Description Languages (HDLs) such as SystemVerilog or VHDL are, due to their sheer complexity, insufficient to transport designs through modern circuit design flows. Instead, each design automation tool lowers HDLs to its own Intermediate Representation (IR). These tools are monolithic and mostly proprietary, disagree in their implementation of HDLs, and while many redundant IRs exist, no IR today can be used through the entire circuit design flow. To solve this problem, we propose the LLHD multi-level IR. LLHD is designed as a simple, unambiguous reference description of a digital circuit, yet fully captures existing HDLs. We show this with our reference compiler on designs as complex as full CPU cores. LLHD comes with lowering passes to a hardware-near structural IR, which readily integrates with existing tools. LLHD establishes the basis for innovation in HDLs and tools without redundant compilers or disjoint IRs. For instance, we implement an LLHD simulator that runs up to 2.4x faster than commercial simulators but produces equivalent, cycle-accurate results. An initial vertically-integrated research prototype is capable of representing all levels of the IR, implements lowering from the behavioural to the structural IR, and covers a sufficient subset of SystemVerilog to support a full CPU design.
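
    A multi-level IR with lowering passes can be pictured with a toy model: behavioural ops are rewritten into structural ones, while already-structural ops pass through untouched. A C++ sketch under that assumption (the op names are invented for illustration; real LLHD instructions such as its signal drive/probe ops differ):

    ```cpp
    #include <iostream>
    #include <string>
    #include <vector>

    // Toy two-level IR: each op is tagged with the level it belongs to.
    struct Op { std::string name; bool behavioural; };
    using Ir = std::vector<Op>;

    // A lowering pass: expand behavioural ops into structural ops.
    Ir lower(const Ir& in) {
        Ir out;
        for (const Op& op : in) {
            if (!op.behavioural) { out.push_back(op); continue; }
            if (op.name == "delayed_assign") {       // e.g. `a <= b after 1ns`
                out.push_back({"register", false});  // storage element
                out.push_back({"connect", false});   // wiring
            } else {
                out.push_back(op);                   // not yet lowered
            }
        }
        return out;
    }

    int main() {
        Ir behavioural = {{"delayed_assign", true}, {"and_gate", false}};
        for (const Op& op : lower(behavioural))
            std::cout << op.name << '\n';  // register, connect, and_gate
    }
    ```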

    Automatic Pool Allocation: Compile-Time Control of Data Structure Layout in the Heap

    Despite the potential importance of data structure layouts and traversal patterns, compiler transformations on pointer-intensive programs are performed primarily using pointer analysis, and not by controlling and using information about the layout of high-level data structures. This paper describes a compiler transformation called Automatic Pool Allocation that segregates instances of "logical" data structures in the heap into distinct pools, and allows different heuristics to be used to partially control the internal layout of those data structures. Because these are rigorous transformations, their results, combined with pointer analysis information, can be used to perform further compiler analyses and transformations, and we briefly list a few examples. Automatic Pool Allocation also provides several direct performance benefits for pointer-intensive programs, most importantly, that traversals of a logical data structure allocated to a separate pool can have better spatial locality and smaller working sets. We evaluate the performance and cache behavior of the code transformed by the automatic pool allocation transformation on a series of heap-intensive and general-purpose benchmarks, and find that it speeds up several C programs by 10-40% or more, and does not hurt (or help) other programs.
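
    Done by hand, pool allocation amounts to giving each logical data structure its own arena, so its nodes end up adjacent in memory. A simplified C++ sketch of that effect (the paper's transformation inserts equivalent pool calls automatically at compile time; alignment handling and per-object freeing are glossed over here):

    ```cpp
    #include <cstddef>
    #include <iostream>
    #include <vector>

    class Pool {
        std::vector<char> arena;
    public:
        // Reserve up front so the arena never reallocates and previously
        // returned pointers stay valid (up to the reserved capacity).
        explicit Pool(std::size_t bytes) { arena.reserve(bytes); }
        void* alloc(std::size_t n) {            // bump-pointer allocation
            std::size_t off = arena.size();
            arena.resize(off + n);
            return arena.data() + off;
        }
    };

    struct TreeNode {
        int value;
        TreeNode *left = nullptr, *right = nullptr;
        static void* operator new(std::size_t n, Pool& p) { return p.alloc(n); }
        static void operator delete(void*, Pool&) {}  // pool frees en masse
    };

    int main() {
        Pool treePool(1 << 20);                  // one pool per logical tree
        TreeNode* root = new (treePool) TreeNode{1};
        root->left  = new (treePool) TreeNode{2};
        root->right = new (treePool) TreeNode{3};
        // Nodes are adjacent in memory, so traversals have better locality.
        std::cout << (root->left->value + root->right->value) << '\n';  // 5
    }
    ```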

    Macroscopic Data Structure Analysis and Optimization

    Providing high performance for pointer-intensive programs on modern architectures is an increasingly difficult problem for compilers. Pointer-intensive programs are often bound by memory latency and cache performance, but traditional approaches to these problems usually fail: pointer-intensive programs are often highly irregular and the compiler has little control over the layout of heap-allocated objects. This thesis presents a new class of techniques named "Macroscopic Data Structure Analyses and Optimizations", which is a new approach to the problem of analyzing and optimizing pointer-intensive programs. Instead of analyzing individual load/store operations or structure definitions, this approach identifies, analyzes, and transforms entire memory structures as a unit. The foundation of the approach is an analysis named Data Structure Analysis and a transformation named Automatic Pool Allocation. Data Structure Analysis is a context-sensitive pointer analysis which identifies data structures on the heap and their important properties (such as type safety). Automatic Pool Allocation uses the results of Data Structure Analysis to segregate dynamically allocated objects on the heap, giving control over the layout of the data structure in memory to the compiler. Based on these two foundation techniques, this thesis describes several performance-improving optimizations for pointer-intensive programs. First, Automatic Pool Allocation itself provides important locality improvements for the program. Once the program is pool allocated, several pool-specific optimizations can be performed to reduce inter-object padding and pool overhead. Second, we describe an aggressive technique, Automatic Pointer Compression, which reduces the size of pointers on 64-bit targets to 32 bits or less, increasing effective cache capacity and memory bandwidth for pointer-intensive programs. This thesis describes the approach, analysis, and transformation of programs with macroscopic techniques, and evaluates the net performance impact of the transformations. Finally, it describes a large class of potential applications for the work in fields such as heap safety and reliability, program understanding, distributed computing, and static garbage collection.
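
    The grouping step at the heart of such an analysis can be caricatured with union-find: merge heap allocation sites whose objects store pointers to one another, and each resulting group is one candidate "logical data structure". A toy, flow- and context-insensitive C++ sketch (real Data Structure Analysis is context-sensitive and field-sensitive, and tracks far more than this):

    ```cpp
    #include <iostream>
    #include <numeric>
    #include <utility>
    #include <vector>

    struct UnionFind {
        std::vector<int> parent;
        explicit UnionFind(int n) : parent(n) {
            std::iota(parent.begin(), parent.end(), 0);
        }
        int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
        void unite(int a, int b) { parent[find(a)] = find(b); }
    };

    int main() {
        // Toy "program": allocation sites 0..3. Each edge records that a
        // pointer to one site's object is stored into another site's object.
        int sites = 4;
        std::vector<std::pair<int,int>> stores = {{0, 1}, {1, 2}};  // list nodes
        // Site 3 is never linked to sites 0..2: a separate structure.

        UnionFind uf(sites);
        for (auto [from, to] : stores) uf.unite(from, to);

        for (int s = 0; s < sites; ++s)
            std::cout << "site " << s << " -> structure " << uf.find(s) << '\n';
        // Sites 0, 1, 2 share one structure id; site 3 gets its own, so a
        // pass like pool allocation can give each group its own pool.
    }
    ```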

    LLVM: An Infrastructure for Multi-Stage Optimization

    Modern programming languages and software engineering principles are causing increasing problems for compiler systems. Traditional approaches, which use a simple compile-link-execute model, are unable to provide adequate application performance under the demands of the new conditions. Traditional approaches to interprocedural and profile-driven compilation can provide the application performance needed, but require infeasible amounts of compilation time to build the application. This thesis presents LLVM, a design and implementation of a compiler infrastructure which supports a unique multi-stage optimization system. This system is designed to support extensive interprocedural and profile-driven optimizations, while being efficient enough for use in commercial compiler systems. The LLVM virtual instruction set is the glue that holds the system together. It is a low-level representation, but with high-level type information. This provides the benefits of a low-level representation (compact representation, wide variety of available transformations, etc.) as well as providing high-level information to support aggressive interprocedural optimizations at link- and post-link time. In particular, this system is designed to support optimization in the field, both at run-time and during otherwise unused idle time on the machine. This thesis also describes an implementation of this compiler design, the LLVM compiler infrastructure, proving that the design is feasible. The LLVM compiler infrastructure is a maturing and efficient system, which we show is a good host for a variety of research. More information about LLVM can be found on its web site: http://llvm.cs.uiuc.edu
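
    The combination of a low-level instruction set with explicit high-level types is visible in even the smallest IR module. A sketch using today's LLVM C++ API, which differs from the 2002-era API the thesis describes:

    ```cpp
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/raw_ostream.h"

    // Build a tiny typed function: low-level operations (add, ret) but
    // explicit types (i32) that interprocedural passes can rely on.
    int main() {
        llvm::LLVMContext ctx;
        llvm::Module mod("demo", ctx);
        llvm::IRBuilder<> b(ctx);

        auto *fnTy = llvm::FunctionType::get(b.getInt32Ty(),
                                             {b.getInt32Ty()}, false);
        auto *fn = llvm::Function::Create(fnTy, llvm::Function::ExternalLinkage,
                                          "increment", &mod);
        b.SetInsertPoint(llvm::BasicBlock::Create(ctx, "entry", fn));
        b.CreateRet(b.CreateAdd(fn->getArg(0), b.getInt32(1), "inc"));

        mod.print(llvm::outs(), nullptr);  // emits the textual LLVM IR
    }
    ```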